Discovering Conditional Functional Dependencies to Detect Data Inconsistencies
Authors
Abstract
Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an approach that efficiently and robustly discovers conditional functional dependencies for detecting inconsistencies in data and hence improves data quality. We evaluate our approach empirically on three real-world data sets, and show that our approach performs well on these sets across several dimensions such as precision, recall, and runtime. We also compare our approach to an established solution and show that our approach outperforms this solution across the same dimensions. Finally, we describe efforts to deploy our approach as part of an enterprise tool being developed at Accenture to accelerate data quality efforts such as data profiling and cleansing.
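To make the central notion concrete: a conditional functional dependency (CFD) is a functional dependency that is required to hold only on the tuples matching a pattern of constants. The sketch below is a minimal illustration of checking one given CFD against a small table. It is not the discovery approach the abstract describes. The function names, the "_" wildcard convention, and the sample address rows are all assumptions made for this example.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern means "any value"


def matches(row, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows, lhs_pattern, rhs_attr, rhs_value=WILDCARD):
    """Return sorted indices of rows violating the CFD (lhs_pattern -> rhs_attr).

    A row is a violation if it matches the left-hand pattern and either
    (a) its right-hand value differs from a required constant, or
    (b) another matching row has the same left-hand values but a
        different right-hand value.
    """
    groups = defaultdict(list)
    violations = set()
    for i, row in enumerate(rows):
        if not matches(row, lhs_pattern):
            continue  # the CFD says nothing about rows outside its pattern
        if rhs_value != WILDCARD and row[rhs_attr] != rhs_value:
            violations.add(i)
        key = tuple(row[a] for a in sorted(lhs_pattern))
        groups[key].append(i)
    for members in groups.values():
        if len({rows[i][rhs_attr] for i in members}) > 1:
            violations.update(members)
    return sorted(violations)


# Hypothetical sample data: "in the UK, zip determines street".
rows = [
    {"country": "UK", "zip": "EH4 1DT", "street": "Mayfield Rd"},
    {"country": "UK", "zip": "EH4 1DT", "street": "Crichton St"},
    {"country": "US", "zip": "07974", "street": "Mountain Ave"},
    {"country": "US", "zip": "07974", "street": "Main St"},
]
violations = cfd_violations(rows, {"country": "UK", "zip": WILDCARD}, "street")
print(violations)
```

Only the two UK rows are flagged: they share a zip code but disagree on the street. The two US rows also disagree, but they fall outside the pattern, which is what distinguishes a CFD from an ordinary functional dependency.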
Similar resources
Discovering Data Quality Rules in a Master Data Management
Dirty data continues to be an important issue for companies. The Data Warehousing Institute [Eckerson, 2002], [Rockwell, 2012] stated that poor data costs US businesses $611 billion annually and that erroneously priced data in retail databases costs US customers $2.5 billion each year. Data quality is becoming more and more critical. The database community pays particular attention to this subject whe...
Discovering Conditional Functional Dependencies in XML Data
XML data inconsistency has become a serious problem since XML was widely adopted as a standard for data representation on the web. XML-based standards such as OASIS, xCBL and xBRL have been used to report and exchange business and financial information. Such standards focus on technical rather than semantic aspects. XML Functional Dependencies (XFDs) have been introduced to improve XML semantic...
Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions
Received Apr 11, 2017; revised May 5, 2017; accepted May 24, 2017. Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures have significant importance for data dependencies in data mining. To adapt to exceptions in real data, the measures are used to relax the strictness of CFDs for mor...
Règles d’Edition: Fouille et Application au Nettoyage de Données
Dirty data is a serious problem for businesses, leading to incorrect decision making, inefficient daily operations, and ultimately wasting both time and money. A variety of integrity constraints like Conditional Functional Dependencies (CFD) have been studied for data cleaning. Data repairing methods based on these constraints are strong at detecting inconsistencies but are limited in how to corre...
Discover Dependencies from Data - A Review
Functional and inclusion dependency discovery is important to knowledge discovery, database semantics analysis, database design, and data quality assessment. Motivated by the importance of dependency discovery, this paper reviews the methods for functional dependency, conditional functional dependency, approximate functional dependency and inclusion dependency discovery in relational databases ...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2010